EU Employment Analysis (EUROSTAT Data)

Davide Bittelli

December 8, 2024


1 Introduction

This project provides an in-depth analysis of labor force statistics sourced from the Eurostat lfsa_egan dataset. The dataset includes a wealth of information on labor market participation, broken down by demographic factors such as sex, age, citizenship, and geographical regions across European countries. The goal is to explore how these factors influence employment trends over time and how they vary across different countries and regions.

In addition to the statistical analysis, the project also includes a geospatial analysis of labor market trends for the year 2023. This analysis uses interactive maps to visualize employment patterns across different European countries and regions.

The analysis focuses on several key aspects, including the evolution of Employment Trends, with particular attention to the differences between citizenship groups, age categories, and genders. Additionally, the project explores employment disparities by gender, as well as the relationship between employment levels and both gender and citizenship categories.

Using interactive visualizations built in R, the project further compares employment patterns before and after the EU enlargements of 2004 and 2007 (2000 versus 2007), and after two major crises, the 2008 financial crisis and the COVID-19 pandemic (2010 versus 2023). The findings aim to provide insight into the changing landscape of the European labor market, highlighting the factors that contribute to employment dynamics and the ongoing challenges in achieving gender equality in labor force participation.


2 Loading necessary Libraries

Loading the necessary libraries for data manipulation, analysis, and visualization.

# Load necessary libraries
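library(tidyverse)     # data manipulation and plotting (ggplot2, dplyr, forcats, ...)
library(eurostat)      # download Eurostat datasets
library(FactoMineR)    # multivariate analysis
library(factoextra)    # visualization of multivariate analysis results
library(ggplot2)
library(dplyr)
library(forcats)
library(lubridate)     # date handling
library(plotly)        # interactive plots
library(scales)        # axis and label formatting
library(patchwork)     # combining ggplot2 plots
library(gridExtra)     # arranging plots in a grid
library(highcharter)   # Highcharts maps and charts
library(knitr)         # report rendering
library(kableExtra)    # table formatting
library(htmltools)     # HTML helpers
library(htmlwidgets)   # HTML widget support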
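
Before rendering the report, it can help to confirm that every package listed above is installed. The following is a minimal sketch rather than part of the original analysis; the helper name missing_packages is hypothetical, and the check relies only on base R's requireNamespace().

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "eurostat", "FactoMineR", "factoextra", "plotly",
              "scales", "patchwork", "gridExtra", "highcharter", "knitr",
              "kableExtra", "htmltools", "htmlwidgets")

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Returning the missing names instead of installing them automatically leaves the decision to install with the user.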

3 Data Import and Exploration

The lfsa_egan dataset was obtained from EUROSTAT’s official website. This dataset belongs to the Labour Force Survey (LFS) and contains annual employment data categorized by different demographics and economic dimensions.
- Dataset: 496671 observations and 8 variables.
- Types of variables: it mainly contains categorical variables and one numerical variable.

library(eurostat) 

data <- get_eurostat("lfsa_egan")

In order to have a better understanding of the dataset, a brief description for each variable and the corresponding unique values are provided.

Variable    | Description                                        | Type | Unique values
TIME_PERIOD | Sampling year                                      | date | 1995-01-01 to 2023-01-01 (one value per year, 29 in total)
geo         | Geopolitical entity                                | chr  | AT BE CH CY CZ DE DK EA20 EE EL ES EU27_2020 FI FR HU IE IS IT LU ME MT NL NO PT RS SE SI SK UK BG HR LT LV MK PL RO BA TR
citizen     | Country of citizenship                             | chr  | EU27_2020_FOR FOR NAT NEU27_2020_FOR NRP STLS TOTAL
age         | Age range                                          | chr  | Y15-19 Y15-24 Y15-39 Y15-59 Y15-64 Y15-74 Y20-24 Y20-64 Y25-29 Y25-49 Y25-54 Y25-59 Y25-64 Y25-74 Y30-34 Y35-39 Y40-44 Y40-59 Y40-64 Y45-49 Y50-54 Y50-59 Y50-64 Y50-74 Y55-59 Y55-64 Y60-64 Y65-69 Y65-74 Y70-74 Y_GE15 Y_GE25 Y_GE50 Y_GE65 Y_GE75
unit        | Unit of measurement for employment                 | chr  | THS_PER (Thousand persons)
sex         | Gender                                             | chr  | F (Female) M (Male) T (Total)
freq        | Sampling frequency                                 | chr  | A (Annual)
values      | Employment value for each combination of variables | num  | numerical values
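
A summary like the one above can be produced programmatically. The sketch below is an illustration under stated assumptions: toy is a small made-up stand-in for the Eurostat data, and describe_variables is a hypothetical helper, not a function from the original report or from any package.

```r
# Toy stand-in for the Eurostat data (values are made up for illustration).
toy <- data.frame(
  freq = "A",
  unit = "THS_PER",
  sex = c("F", "M", "T", "T"),
  citizen = c("NAT", "NAT", "FOR", "TOTAL"),
  geo = c("IT", "IT", "DE", "DE"),
  TIME_PERIOD = as.Date(c("2022-01-01", "2022-01-01",
                          "2023-01-01", "2023-01-01")),
  values = c(10.5, 12.1, NA, 40.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per variable with its class, the number of
# distinct non-missing values, and the first `max_values` of them.
describe_variables <- function(df, max_values = 10) {
  data.frame(
    Variable = names(df),
    Type = vapply(df, function(x) class(x)[1], character(1)),
    N_unique = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1)),
    Unique_values = vapply(df, function(x) {
      u <- sort(unique(x[!is.na(x)]))
      paste(head(as.character(u), max_values), collapse = " ")
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

var_summary <- describe_variables(toy)
print(var_summary)
```

Capping the listed values with max_values keeps the table readable for variables with many categories, such as age or geo.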

A description of each geopolitical entity code is provided below:

Code Country Name
AT Austria
BE Belgium
CH Switzerland
CY Cyprus
CZ Czech Republic
DE Germany
DK Denmark
EE Estonia
EL Greece
ES Spain
FI Finland
FR France
HU Hungary
IE Ireland
IS Iceland
IT Italy
LU Luxembourg
ME Montenegro
MT Malta
NL Netherlands
NO Norway
PT Portugal
RS Serbia
SE Sweden
SI Slovenia
SK Slovakia
UK United Kingdom
BG Bulgaria
HR Croatia
LT Lithuania
LV Latvia
MK North Macedonia
PL Poland
RO Romania
BA Bosnia and Herzegovina
TR Turkey
EU27_2020 European Union member states as of 2020, encompassing all 27 EU countries.
EA20 Euro Area (20 countries): EU member states that had adopted the euro as their currency as of 2023.

4 Exploratory Data Analysis (EDA)

The dataset contains one continuous and seven categorical variables. Two of the categorical variables (unit and freq) have a single category, so they are not explored further. Interactive visualizations show the distribution of missing values and of the following variables: sex, age, geo, citizen, and TIME_PERIOD. Users can zoom in on each plot to inspect categories with few observations.
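
As an illustration of the missing-value check described above, the following minimal sketch counts missing entries per column and computes the share of missing employment values per year. It uses a made-up toy data frame in place of the Eurostat data and is not the code behind the interactive plots.

```r
# Toy data with made-up values; NA marks a missing observation.
toy <- data.frame(
  geo = c("IT", "IT", "DE", "DE", "FR", "FR"),
  TIME_PERIOD = as.Date(c("2000-01-01", "2023-01-01", "2000-01-01",
                          "2023-01-01", "2000-01-01", "2023-01-01")),
  values = c(NA, 120.4, 350.0, 410.7, NA, 290.3)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Share of missing `values` within each year
na_share_by_year <- tapply(is.na(toy$values),
                           format(toy$TIME_PERIOD, "%Y"),
                           mean)

print(na_per_column)
print(round(na_share_by_year, 2))
```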


6 Geo-Spatial Analysis

A second EUROSTAT dataset, lfsi_emp_a, was used for this analysis. It contains data similar to the previous one and additionally expresses employment as a percentage of the population (unit PC_POP), that is, as an employment rate. This measure makes employment levels comparable across countries of different sizes and across years.

data2 <- get_eurostat("lfsi_emp_a")

The employment rate of the working-age population (Y15-64) in 2023 is displayed in the following map.

countries <- c(
  "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "EL", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE", "CH", "IS", "ME", "NO", "RS", "UK", "MK", "BA", "TR")

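# Keep the total employment rate (% of population aged 15-64) per country and year
df <- data2 %>%
  filter(age == "Y15-64",
         geo %in% countries,
         indic_em == "EMP_LFS",
         sex == "T",
         unit == "PC_POP") %>%
  mutate(year = year(TIME_PERIOD),
         # Eurostat uses EL and UK; the map is joined on ISO codes,
         # which are assumed here to be GR and GB
         geo = recode(geo, EL = "GR", UK = "GB")) %>%
  select(geo, year, values)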

# Define color classes (example, adjust ranges accordingly)
dta_clss <- list(
  list(from = 0, to = 55, color = "#EBEBEB", name = "<55"),
  list(from = 55, to = 65, color = "#BFD0E4", name = "55-65"),
  list(from = 65, to = 75, color = "#7FA1C9", name = "65-75"),
  list(from = 75, to = 82, color = "#4073AF", name = "75-82"),
  list(from = 82, to = 100, color = "#003776", name = ">82")
)

# Function to generate the map for a specific year
generate_map <- function(year_selected) {
  
  # Filter the data for the selected year
  df_year <- df %>%
    filter(year == year_selected)
  
  # Generate the map
  hc <- hcmap("custom/europe", 
              data = df_year,
              joinBy = c("iso-a2", "geo"),  
              name = "Employment Rate",  
              value = "values",  
              tooltip = list(pointFormat = "{point.value}%"),  
              dataLabels = list(
                enabled = TRUE, 
                format = "{point.value}%"  # Display the value followed by "%"
              )) %>%
    hc_colorAxis(dataClassColor = "category", 
                 dataClasses = dta_clss) %>%
    hc_title(text = paste0("Employment Rate, ", year_selected)) %>%  # Title reflects the selected year
    hc_subtitle(text = "(% of people aged 15-64)")
  
  return(hc)
}

# Generate the map for 2023 (any other year available in df can be used)
generate_map(2023)
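
To check how employment rates fall into the color classes defined in dta_clss, the classes can be reproduced with base R's cut(). This is a sketch under assumptions: the rates are made up, and intervals are treated as closed on the left, which may differ from how Highcharts resolves values that fall exactly on a class boundary.

```r
# Class boundaries and labels copied from dta_clss
breaks <- c(0, 55, 65, 75, 82, 100)
labels <- c("<55", "55-65", "65-75", "75-82", ">82")

# Made-up employment rates, for illustration only
rates <- data.frame(
  geo = c("IT", "DE", "NL", "TR"),
  values = c(61.5, 77.2, 82.4, 53.0)
)

# Assumption: intervals are [from, to), with 100 included in the last class
rates$class <- as.character(
  cut(rates$values, breaks = breaks, labels = labels,
      right = FALSE, include.lowest = TRUE)
)

# Number of countries per class, in legend order
class_counts <- table(factor(rates$class, levels = labels))

print(rates)
print(class_counts)
```

A tabulation like class_counts shows quickly whether a class is empty, in which case the ranges may need adjusting.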

8 Employment Comparison by Country and Citizenship Category

A comparative analysis was conducted to explore how employment levels vary between different countries for specific citizenship categories.

For this analysis, a subset of the initial dataset was created, including only the working-age population (Y15-64) with no differentiation by sex (T). Specific years were selected for analysis:

  • 2000: Represents the early 2000s, before major EU enlargements.
  • 2007: Captures the effects after major EU enlargements.
  • 2010: Captures employment trends post-2008 Financial Crisis.
  • 2023: Reflects the latest available data, including post-COVID-19 recovery.

Heatmaps were generated for each of these years, and two comparisons were made.

8.1 Heatmaps Comparison - Before and After the EU Enlargements (2000 and 2007)

Heatmaps for 2000 and 2007 were generated and compared to capture the differences in employment levels across countries before and after the EU enlargements. All countries in the dataset are included, while the EU-level aggregates (EU27_2020 and EA20) are excluded. To avoid double counting, the TOTAL citizenship category and the two sub-groups of foreign citizens (EU27_2020_FOR and NEU27_2020_FOR) were removed, leaving the categories FOR, NAT, NRP, and STLS.
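
The double-counting issue can be illustrated with made-up numbers. The sketch assumes, for illustration, that FOR is the sum of EU27_2020_FOR and NEU27_2020_FOR and that TOTAL is the sum of NAT and FOR; the NRP and STLS categories are left out of the toy data for simplicity.

```r
# Made-up employment figures (thousand persons) for one country and year
toy <- data.frame(
  citizen = c("NAT", "FOR", "EU27_2020_FOR", "NEU27_2020_FOR", "TOTAL"),
  values = c(900, 100, 40, 60, 1000)
)

# Summing every category counts foreign citizens twice and everyone again in TOTAL
naive_sum <- sum(toy$values)

# Dropping the overlapping categories leaves each person counted once
excluded <- c("EU27_2020_FOR", "NEU27_2020_FOR", "TOTAL")
clean_sum <- sum(toy$values[!toy$citizen %in% excluded])

print(c(naive = naive_sum, clean = clean_sum))
```

With these numbers the naive sum is more than twice the figure reported in TOTAL, while the filtered sum matches it.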

library(dplyr)
library(plotly)

  # Filter data for selected years and only 'Total' for sex and working-age population (15-64)
  selected_years <- c("2000-01-01", "2007-01-01")
  filtered_data <- data %>%
    filter(
      TIME_PERIOD %in% as.Date(selected_years),
      sex == "T",
      age == "Y15-64",
      !citizen %in% c("EU27_2020_FOR", "NEU27_2020_FOR", "TOTAL"),
      !geo %in% c("EU27_2020", "EA20")  # Exclude "EU27_2020" and "EA20"
    )
  
  # Group employment by country, citizenship, and year;
  # .groups = "drop" returns an ungrouped result and avoids the grouping message
  country_citizen_summary <- filtered_data %>%
    group_by(geo, citizen, TIME_PERIOD) %>%
    summarise(total_employment = sum(values, na.rm = TRUE), .groups = "drop")
  
  # Function to create a plotly heatmap for a specific year
  create_plotly_heatmap <- function(year, showlegend = TRUE) {
    df <- country_citizen_summary %>% filter(TIME_PERIOD == as.Date(year))
    
    # Multiply total_employment by 1,000 to reflect real values
    df <- df %>% mutate(total_employment_real = total_employment * 1000)
    
    plot_ly(
      data = df,
      x = ~geo,
      y = ~citizen,
      z = ~total_employment_real,
      type = "heatmap",
      colors = colorRamp(c("#f0f9ff", "#003399")),
      colorbar = list(
        title = "<b>Employment</b>",
        tickfont = list(size = 12),
        titlefont = list(size = 14, family = "Arial")
      ),
      showscale = showlegend,   # Control legend display
      zmin = 0, zmax = 50000000,   # Set consistent color scale limits
      
      # Custom hover text
        hovertemplate = paste(
          "<b>Country:</b> %{x}<br>",
          "<b>Citizenship:</b> %{y}<br>",
          "<b>Total Employment:</b> %{z}<br>",
          "<extra></extra>"  # Removes default trace info
        )
    ) %>%
      layout(
        title = list(
          #text = paste("<b>Employment by Country and Citizenship -", format(as.Date(year), "%Y"), "</b>"),
          font = list(size = 18, family = "Arial"),
          x = 0.5,  # Center the title
          xanchor = "center"
        ),
        xaxis = list(
          title = "<b>Country</b>",
          tickangle = 45,
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        yaxis = list(
          title = "<b>Citizenship Category</b>",
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        margin = list(t = 60, b = 60)  # Add padding to top and bottom margins
      )
  }
  
  # Create plotly heatmaps for each selected year
  interactive_heatmap_2000 <- create_plotly_heatmap("2000-01-01", showlegend = TRUE)
  interactive_heatmap_2007 <- create_plotly_heatmap("2007-01-01", showlegend = FALSE)

  # Combine the interactive heatmaps vertically
  combined_interactive_heatmaps <- subplot(
    interactive_heatmap_2000,
    interactive_heatmap_2007,
    nrows = 2,
    shareX = TRUE,
    shareY = TRUE,
    titleX = TRUE,
    titleY = TRUE
  ) %>%
    layout(
      annotations = list(
        list(
          text = "<b>Year: 2000 (before EU enlargements)</b>",
          x = 0.5,
          y = 1.06,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        ),
        list(
          text = "<b>Year: 2007 (after EU enlargements)</b>",
          x = 0.5,
          y = 0.50,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        )
      ),
      title = "<b>Employment Trends by Citizenship Category and Country (2000 vs 2007)</b>",
      margin = list(l = 100, r = 50, t = 80, b = 100),
      titlefont = list(size = 20, family = "Arial", color = "black")
    )
  
  # Display the combined interactive heatmaps
  combined_interactive_heatmaps

The heatmaps for 2000 and 2007 show notable variations in employment levels across European countries and citizenship categories. Color intensity represents the employment level, with darker blue indicating higher employment.

8.1.1 Year: 2000 (Before EU Enlargements)

  • Many countries have missing values for one or more citizenship groups, which points to gaps in data collection or reporting for that year.
  • Employment for nationals (NAT) dominates in most countries, reflecting a labor market primarily consisting of native citizens.
  • The employment distribution is more concentrated in Western European countries like Germany (DE), France (FR), and the UK.

8.1.2 Year: 2007 (After EU Enlargements)

  • Employment of foreign citizens (FOR) increases noticeably, reflecting the impact of the 2004 EU enlargement (when 10 new countries joined the EU) and of the 2007 enlargement (when Bulgaria and Romania joined).
  • Employment growth in countries like Germany (DE), the UK, and Spain (ES) suggests increased labor migration and workforce integration of foreign citizens.
  • Some Eastern European countries begin to show increased employment for both nationals and foreign citizens, indicating the effect of free movement of workers within the EU.
  • Missing values are less frequent than in 2000.
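
The visual comparison between the two heatmaps can be complemented by computing the change per country and citizenship category. The following is a minimal sketch with made-up figures in thousand persons; the column names mirror the dataset, but the numbers are not taken from Eurostat.

```r
# Made-up employment (thousand persons) for two countries and two years
toy <- data.frame(
  geo = rep(c("DE", "ES"), each = 4),
  citizen = rep(c("NAT", "FOR"), times = 4),
  year = rep(rep(c(2000, 2007), each = 2), times = 2),
  values = c(33000, 2900, 34000, 3300, 15000, 300, 17500, 2700)
)

# Put the two years side by side for each country and citizenship category
before <- toy[toy$year == 2000, c("geo", "citizen", "values")]
after <- toy[toy$year == 2007, c("geo", "citizen", "values")]
change <- merge(before, after, by = c("geo", "citizen"),
                suffixes = c("_2000", "_2007"))

# Absolute change in persons and relative change in percent
change$abs_change <- (change$values_2007 - change$values_2000) * 1000
change$pct_change <- round(100 * (change$values_2007 / change$values_2000 - 1), 1)

print(change)
```

Reporting the relative change next to the absolute one helps compare small and large labor markets, which a shared color scale tends to obscure.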

8.2 Heatmaps Comparison - After Two Major Crises (2010 and 2023)

Heatmaps for 2010 and 2023 were compared to capture the differences in employment levels across countries after two important crises: the 2008 Financial Crisis and the COVID-19 Pandemic.

library(dplyr)
library(plotly)

  # Filter data for selected years and only 'Total' for sex and working-age population (15-64)
  selected_years <- c("2010-01-01", "2023-01-01")
  filtered_data <- data %>%
    filter(
      TIME_PERIOD %in% as.Date(selected_years),
      sex == "T",
      age == "Y15-64",
      !citizen %in% c("EU27_2020_FOR", "NEU27_2020_FOR", "TOTAL"),
      !geo %in% c("EU27_2020", "EA20")  # Exclude "EU27_2020" and "EA20"
    )
  
  # Group employment by country, citizenship, and year;
  # .groups = "drop" returns an ungrouped result and avoids the grouping message
  country_citizen_summary <- filtered_data %>%
    group_by(geo, citizen, TIME_PERIOD) %>%
    summarise(total_employment = sum(values, na.rm = TRUE), .groups = "drop")
  
  # Function to create a plotly heatmap for a specific year
  create_plotly_heatmap <- function(year, showlegend = TRUE) {
    df <- country_citizen_summary %>% filter(TIME_PERIOD == as.Date(year))
    
    # Multiply total_employment by 1,000 to reflect real values
    df <- df %>% mutate(total_employment_real = total_employment * 1000)
    
    plot_ly(
      data = df,
      x = ~geo,
      y = ~citizen,
      z = ~total_employment_real,
      type = "heatmap",
      colors = colorRamp(c("#f0f9ff", "#003399")),
      colorbar = list(
        title = "<b>Employment</b>",
        tickfont = list(size = 12),
        titlefont = list(size = 14, family = "Arial")
      ),
      showscale = showlegend,   # Control legend display
      zmin = 0, zmax = 50000000,   # Set consistent color scale limits
      
      # Custom hover text
        hovertemplate = paste(
          "<b>Country:</b> %{x}<br>",
          "<b>Citizenship:</b> %{y}<br>",
          "<b>Total Employment:</b> %{z}<br>",
          "<extra></extra>"  # Removes default trace info
        )
    ) %>%
      layout(
        title = list(
          #text = paste("<b>Employment by Country and Citizenship -", format(as.Date(year), "%Y"), "</b>"),
          font = list(size = 18, family = "Arial"),
          x = 0.5,  # Center the title
          xanchor = "center"
        ),
        xaxis = list(
          title = "<b>Country</b>",
          tickangle = 45,
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        yaxis = list(
          title = "<b>Citizenship Category</b>",
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        margin = list(t = 60, b = 60)  # Add padding to top and bottom margins
      )
  }
  
  # Create plotly heatmaps for each selected year
  interactive_heatmap_2010 <- create_plotly_heatmap("2010-01-01", showlegend = TRUE)
  interactive_heatmap_2023 <- create_plotly_heatmap("2023-01-01", showlegend = FALSE)

  # Combine the interactive heatmaps vertically
  combined_interactive_heatmaps <- subplot(
    interactive_heatmap_2010,
    interactive_heatmap_2023,
    nrows = 2,
    shareX = TRUE,
    shareY = TRUE,
    titleX = TRUE,
    titleY = TRUE
  ) %>%
    layout(
      annotations = list(
        list(
          text = "<b>Year: 2010 (after 2008 Financial Crisis)</b>",
          x = 0.5,
          y = 1.06,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        ),
        list(
          text = "<b>Year: 2023 (after COVID-19 Pandemic)</b>",
          x = 0.5,
          y = 0.50,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        )
      ),
      title = "<b>Employment Trends by Citizenship Category and Country (2010 vs 2023)</b>",
      margin = list(l = 100, r = 50, t = 80, b = 100),
      titlefont = list(size = 20, family = "Arial", color = "black")
    )
  
  # Display the combined interactive heatmaps
  combined_interactive_heatmaps

The heatmaps for 2010 and 2023 show variations in employment levels across European countries and citizenship categories. Color intensity represents the employment level, with darker blue indicating higher employment.

8.2.1 Year: 2010 (After 2008 Financial Crisis)

  • Employment for nationals (NAT) remains prominent, but declines compared to the previous year are observed in some countries due to the impact of the 2008 financial crisis.
  • Employment levels are lower for foreign citizens (FOR), reflecting the economic downturn's disproportionate impact on migrant workers.
  • Employment remains concentrated in countries like Germany (DE), the United Kingdom (UK), and France (FR), which were more resilient to the crisis.
  • Southern European countries such as Spain (ES) and Italy (IT) show noticeable declines compared to the previous year, consistent with the severe economic challenges they faced during this period.

8.2.2 Year: 2023 (After COVID-19 Pandemic)

  • Employment levels recover for both nationals (NAT) and foreign citizens (FOR), indicating the labor market rebound after the COVID-19 pandemic.
  • Employment of foreign citizens grows significantly in countries like Germany (DE), France (FR), and Spain (ES), suggesting a return to pre-pandemic trends and increased workforce integration.
  • Eastern European countries show varying employment levels, reflecting ongoing economic adjustments and workforce mobility within the EU.
  • New missing values appear for the United Kingdom (UK) and North Macedonia (MK): the UK left the European Union in 2020, and North Macedonia implemented a regulation that introduced significant changes to the Labour Force Survey (LFS).
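
Countries that report a value in one year but not in another can be identified with a set difference. The sketch below uses made-up figures and a simplified structure with one row per country and year; it only illustrates the kind of check behind the last observation.

```r
# Made-up employment (thousand persons); NA marks a missing value
toy <- data.frame(
  geo = c("UK", "UK", "DE", "DE", "MK", "MK"),
  year = c(2010, 2023, 2010, 2023, 2010, 2023),
  values = c(28000, NA, 38000, 41000, 640, NA)
)

# Countries with a reported value in each year
has_2010 <- toy$geo[toy$year == 2010 & !is.na(toy$values)]
has_2023 <- toy$geo[toy$year == 2023 & !is.na(toy$values)]

# Countries reported in 2010 but missing in 2023
newly_missing <- setdiff(has_2010, has_2023)

print(newly_missing)
```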

9 Conclusions

This analysis of labor force data from Eurostat highlights key trends in employment across Europe, focusing on sex, age, geopolitical entity and citizenship. Employment trends show fluctuations driven by events like EU enlargements and economic crises, particularly impacting foreign nationals. Gender disparities persist, though employment levels for both males and females have grown over time.

Geographical differences are evident, with countries like Germany and France contributing significantly to total employment. The analysis of citizenship groups reveals a notable increase in employment among non-EU nationals, especially post-2010. Overall, the findings underscore the importance of addressing employment inequalities and adapting to global shifts in the labor market.


10 References

The following sources were consulted and referenced throughout this analysis:

  1. Eurostat. (2024). Labor Force Survey (lfsa_egan). Available at: https://ec.europa.eu/eurostat/databrowser/view/LFSA_EGAN/default/table?lang=en

  2. Eurostat. (2024). Employment and activity - LFS adjusted series (lfsi_emp_a). Available at: https://ec.europa.eu/eurostat/databrowser/view/lfsi_emp_a/default/table?lang=en

  3. European Commission. (2020). The European Union. Available at: https://european-union.europa.eu/easy-read_en

  4. Regenstein, J. (2022). Highcharts for R users. Available at: https://www.highcharts.com/blog/tutorials/highcharts-for-r-users/